Sample variance and Lyman-α forest transmission statistics




E. Rollinde 1, T. Theuns 2,3, J. Schaye 4, I. Pâris 1, P. Petitjean 1

1 UPMC University Paris 06, UMR7095, Institut d'Astrophysique de Paris, F-75014, Paris, France

2 Institute of Computational Cosmology, Department of Physics, University of Durham,
Science Laboratories, South Road, Durham DH1 3LE

3 Universiteit Antwerpen, Campus Groenenborger, Groenenborgerlaan 171, B-2020 Antwerpen, Belgium

4 Leiden Observatory, Leiden University P.O. Box 9513, 2300 RA Leiden, The Netherlands



25 September 2012 



ABSTRACT 

We compare the observed probability distribution function of the transmission in the H I
Lyman-α forest, measured from the UVES 'Large Programme' sample at redshifts z =
[2, 2.5, 3], to results from the GIMIC cosmological simulations. Our measured values for
the mean transmission and its PDF are in good agreement with published results. Errors on
statistics measured from high-resolution data are typically estimated using bootstrap or
jack-knife resampling techniques after splitting the spectra into chunks. We demonstrate
that these methods tend to underestimate the sample variance unless the chunk size is much
larger than is commonly the case. We therefore estimate the sample variance from the
simulations. We conclude that observed and simulated transmission statistics are in good
agreement; in particular, we do not require the temperature-density relation to be 'inverted'.

Key words: cosmology: theory — methods: numerical — galaxies: intergalactic medium 



1 INTRODUCTION 

At high redshift, the intergalactic medium (IGM) contains the majority of baryons in the
Universe (Petitjean et al. 1993; Fukugita et al. 1998), is highly ionised by the
UV-background (UVB) produced by galaxies and QSOs (Gunn & Peterson 1965) at least since
redshift z ~ 6 (Fan et al. 2006; Becker et al. 2007), becoming increasingly neutral near
z ~ 7 (Mortlock et al. 2011). It is detected in absorption against bright sources as the
H I Lyman-α forest (Lynds 1971); see Rauch (1998) for a review.

High signal-to-noise observations with high-resolution echelle spectrographs such as the
Ultraviolet and Visual Echelle Spectrograph (UVES) on the Very Large Telescope (VLT, e.g.
Bergeron et al. 2004) and HIRES on Keck (e.g. Hu et al. 1995) of this forest of H I
absorption lines, together with numerical simulations (Cen et al. 1994; Petitjean et al.
1995; Hernquist et al. 1996; Zhang et al. 1995; Theuns et al. 1998) and theoretical models
(Bi et al. 1992; Schaye 2001), have painted a picture in which low column-density H I
absorption lines trace the filaments of the 'cosmic web', and high column-density
absorption lines trace the surroundings of galaxies. Simulations that include
self-shielding of the UVB reproduce the observed column density distribution over 10
orders of magnitude (Altay et al. 2011).

In this paradigm, the IGM as probed by the Lyman-α forest consists of mildly non-linear
gas density fluctuations. The gas traces the dark matter, and is photo-ionised and
photo-heated by the UV-background. Although metals are detected in the IGM (Cowie et al.
1995), even at low densities (e.g. Schaye et al. 2003; Aracil et al. 2004), stirring of
the IGM due to feedback from galaxies or AGN is probably not strongly affecting the vast
majority of the baryons (e.g. Theuns et al. 2002; McDonald et al. 2005). This makes it
possible to use Lyman-α observations to constrain cosmological parameters (McDonald &
Miralda-Escudé 1999; Rollinde et al. 2003; Viel & Haehnelt 2006; McDonald et al. 2006),
as well as to probe the density distribution around quasars and galaxies (Rollinde et al.
2005; Guimarães et al. 2007; Kim & Croft 2008).

Photo-heating of the low-density IGM introduces a near-power law relation between its
temperature, $T$, and density, $\rho$, of the form $T = T_0 \Delta^{\gamma-1}$, where
$\Delta = \rho/\langle\rho\rangle$ (Hui & Gnedin 1997; Theuns et al. 1998). The evolution
of $T_0$ and $\gamma$ has been measured (Schaye et al. 2000; Ricotti et al. 2000; McDonald
et al. 2001; Lidz et al. 2006; Becker et al. 2007; Lidz et al. 2010; Becker et al. 2011),
and depends on the re-ionization history (e.g. Theuns et al. 2002; Hui & Haiman 2003) and
the hardness of the UV-background. When the gas is strongly photo-heated after the
re-ionization of H I and He II, $T_0$ increases and the gas becomes nearly isothermal,
$\gamma \to 1$; asymptotically the balance between photo-heating and adiabatic cooling
results in $T = T_0 \Delta^{1/1.7}$ and a slowly decreasing $T_0$ with redshift (Hui &
Gnedin 1997; Theuns et al. 1998). The amplitude of the optically thin ionising background
rate ($\Gamma_{12}$), the temperature of the IGM (characterised by $T_0$ and $\gamma$),
and the amplitude of fluctuations ($\sigma_8$) together determine the net amount of
absorption (Rauch et al. 1997; Theuns et al. 2002; Hui & Haiman 2003; Bolton et al. 2005;
Fan et al. 2006; Faucher-Giguère et al. 2008), and the value of $\Gamma_{12}$ inferred by
comparing to simulations is very close to that computed by summing over sources by
Haardt & Madau (2001).

It is also possible to compare the full probability distribution function of the
transmission (TPDF) between simulations and data, which could provide a more accurate
characterisation of the UVB. Such an analysis was performed by Bolton et al. (2008) and
Viel et al. (2009), who compared TPDFs computed from simulations to those measured from a
large sample of high-resolution UVES spectra (Kim et al. 2007). They performed a standard
$\chi^2$ analysis and suggested that an 'inverted' $T$-$\rho$ relation, $\gamma < 1$, may
be required to fit the data. A similar conclusion was reached by Becker et al. (2007)
using Keck data and different theoretical optical depth distributions. Calura et al.
(2012) have done the same analysis with additional quasars at z ~ 3. Their new analysis
favours a value of $\gamma$ that is larger than what they found before, but is still
slightly lower than one. From a theory point of view it is difficult to understand how an
inverted temperature-density relation might arise: simulations that include spectral
hardening computed with a full radiative transfer calculation (e.g. McQuinn et al. 2009;
Bolton et al. 2009) do not result in $\gamma < 1$. If the IGM's $T$-$\rho$ relation were
indeed inverted, there may be missing physics in simulations of the Lyman-α forest (such
as the impact of blazars as studied recently by Chang et al. 2011; Puchwein et al. 2011),
which may impact other statistics such as the Lyman-α power spectrum (e.g. McQuinn et al.
2011) and cosmological constraints derived from that (e.g. Gratton et al. 2008; Boyarsky
et al. 2009). Partly for this reason, Lyman-α forest constraints were not used by Komatsu
et al. (2009) in their determination of cosmological parameters from WMAP and other data.

However, there are both numerical and observational difficulties in the characterisation
of the absorption. Numerical issues were investigated in a paper by Tytler et al. (2009),
who analysed the importance of large-scale modes in the determination of the TPDF in a
numerical simulation. These authors showed that smaller simulation boxes predict, on
average, more absorption for a given value of the imposed ionising background. The box
size used in the analyses of Bolton et al. (2008) is 56 Mpc, which, according to Tytler
et al. (2009) (their Table 12), decreases the amplitude of the TPDF by 1 to 5 per cent in
the flux range used in the analysis (0.2 to 0.8) as compared to a bigger box of 76.8 Mpc.
The difference could be up to 10 per cent for even larger simulations. Even so, Tytler et
al. (2009) also found that the predicted TPDF (with their box size of 76.8 Mpc) differs
from the observed one, although to a lesser extent than that seen by Bolton et al. (2008).
They did not consider an inverted temperature-density relation, but discussed other
plausible sources for the discrepancy: the lack of high column density lines
($\log_{10} N_{\rm HI}\,({\rm cm}^{-2}) > 14$) in the simulation, unidentified metal
lines, and the assumed mean flux values. Note that the last two issues were discussed and,
at least partly, accounted for in Bolton et al. (2008).

However, an additional limitation, not considered in Tytler et al. (2009), is the
relatively small number of observed high-resolution spectra. For example, Kim et al.
(2007) use a sample of just 18 spectra. In this paper we use both simulations and data to
get a better handle on just how well such a relatively small sample of spectra determines
the TPDF.

We revisit the analysis of the transmission statistics in terms of its sample variance
using four different observational determinations described below in Section 2.1: (i) the
LUQAS sample of Kim et al. (2007) used by Bolton et al. (2008), (ii) the sample of Calura
et al. (2012) that increases the number of quasars with z ~ 3, (iii) a sample of Keck
spectra analysed and published by McDonald et al. (2000), and finally (iv) a UVES sample
collected in the context of the ESO Large Programme 'Cosmic Evolution of the IGM'
(Bergeron et al. 2004). We demonstrate that published errors on the mean transmission are
often too small because they do not fully account for sample variance. The observed TPDFs
are compared to mock spectra computed from a suite of hydrodynamical simulations called
GIMIC (Crain et al. 2009; Section 2.3) that resolves both large and small scales by using
'zoomed' initial conditions. We generate many mock samples from GIMIC with the same
redshift path as the observed samples, and use these to investigate sample variance in
both the mean transmission and the transmission probability distribution. In particular,
we show how strong lines, which are relatively rare, nevertheless have a substantial
impact on both the mean transmission and its probability distribution, something which
Viel et al. (2004) commented on in the context of the transmission power spectrum. Given
the small redshift paths of the data, we conclude that observations and simulations are
mutually consistent, because of the relatively large sample variance.



2 OBSERVED AND SIMULATED LYMAN-α SPECTRA
2.1 Observed samples 

The transmission in the Lyman-α forest is the ratio $F \equiv F_\lambda/C$ of the measured
flux ($F_\lambda$) over what the flux would be in the absence of absorption. Measuring $F$
requires knowledge of the intrinsic flux of the quasar ($C$; the 'continuum'), and since
we are only interested in absorption due to neutral hydrogen (H I Lyman-α, $n = 1 \to 2$,
$\lambda_0 = 1215.67$ Å), we also need to know the contribution to the absorption from
other elements ('metals'). Neither the continuum nor the contribution from metals is easy
to determine: the intrinsic QSO spectrum contains broad emission lines and, moreover, the
combination of a narrow slit with an echelle spectrograph, required to obtain the high
spectral resolution, means the spectra cannot be accurately flux calibrated. 'Continuum
fitting' spectra to determine $C$ then involves drawing a smooth curve connecting regions
deemed free from absorption, a somewhat subjective procedure. Metal lines are eliminated
by identifying lines too narrow to be due to hydrogen, or from line coincidences where a
metal transition occurs at the same redshift as a (strong) H I absorber or other metal
transition. Finally, a 'proximity region', i.e. the region close to the quasar where it
dominates the UV-background, is excised.

Here we use four observational data sets to determine the mean transmission and its PDF,
referred to below as the LP sample, the LUQAS sample, the sample of Calura et al. (2012),
and the M00 sample.

• The LP sample is from our own independent analysis of a set of 18 UVES VLT spectra,
collected as part of the European Southern Observatory's 'Large Programme' (LP) 'Cosmic
Evolution of the intergalactic medium' (Bergeron et al. 2004). These LP spectra have a
high resolution ($\lambda/\Delta\lambda \sim 45000$) and a high signal-to-noise ratio
(S/N ~ 25-30 per pixel), and were re-binned on to 0.05 Å pixels. The continuum was fitted
using an automatic method described in Aracil et al. (2004), and metal lines were removed
by eliminating contaminated regions. There are no damped Lyman-α absorbers in these lines
of sight. We compute the TPDFs and the mean transmission over three relatively small
redshift ranges, centred at z ~ 2 (1.88 < z < 2.37), z ~ 2.5 (2.37 < z < 2.71) and z ~ 3
(2.71 < z < 3.21). The total number of data pixels in the LP spectra for each of the
redshift bins is 139830, 65067 and 30800 (of which fractions of 74%, 85% and 100% are in
common with the LUQAS sample described below). The corresponding absorption distances
(footnote 1) are $\Delta X$ = 10.5, 5.8 and 2.9, respectively.



• The LUQAS sample used by Bolton et al. (2008) and Viel et al. (2009) is described in
detail by Kim et al. (2007), including details of their method of continuum fitting and
metal line identification. They fit metal lines in the Lyman-α part of the spectrum using
VPFIT (Carswell et al. 1987), then use this to reconstruct an H I spectrum without the
identified metals, as in Theuns et al. (2002). We find that this method has a similar
effect on the transmission distribution as the method we used. The LUQAS sample has 18
spectra, 14 of which are part of the LP sample. Pixels within the Lyman-α forest within a
given redshift range are extracted and combined into a histogram. We will refer to these
published values as the 'LUQAS' data. The transmission PDFs of Kim et al. (2007) are
averaged over the same redshift ranges as the LP ones.

• The Calura et al. (2012) sample is used to investigate the TPDF at redshift z ~ 3. Their
results are split in two bins, 2.62 < z < 3.17 and 3.17 < z < 3.72. We consider only the
first bin, so that it can be compared to the other determinations. The absorption distance
in this bin, after removal of fourteen DLA and LLS regions, is about 4.5. We use their
estimate of the TPDF without metals and LLS.

• The M00 sample is a set of 8 Keck HIRES spectra with resolution and signal-to-noise
similar to the UVES data, and is described in McDonald et al. (2000), hereafter M00. They
use slightly different redshift bins that do not cover our lowest redshift bin, and go up
to z = 4.43. We will therefore only consider their two lower redshift bins:
2.09 < z < 2.67 (33791 data pixels, $\Delta X \sim 3.5$) and 2.67 < z < 3.39 (31897 data
pixels, $\Delta X \sim 3.7$).



Table 1. The mean transmission PDF of 18 UVES Large Program (LP) QSOs, in three redshift
bins (1.88 < z < 2.37, 2.37 < z < 2.71 and 2.71 < z < 3.21). The error is the 2σ variance
among mock GIMIC samples with ensemble average mean transmission
$\langle F \rangle$ = 0.86, 0.77 and 0.71, respectively.

Noise and errors in the continuum fitting can make the transmission $F < 0$ or $F > 1$.
To compute the PDF of the transmission for the LP sample, we use the same binning as used
in the LUQAS and McDonald et al. (2000) analyses, i.e. bins of width 0.05 between
$F = 0.025$ and $F = 0.975$, plus extra bins for those pixels with $F < 0.025$ and
$F > 0.975$. The PDF is then normalised (footnote 2) such that the sum of all values in
all bins equals 20. The full covariance matrix of errors on the PDF is estimated using the
jack-knife technique described in Lidz et al. (2006), but applied to the flux, while they
applied this technique to $\delta_f = (F - \bar F)/\bar F$. Specifically, we estimate the
PDF $P(F_i)$ from the full data sample, divide the data set into 30 different subgroups,
then estimate the PDF of the data sample omitting each subgroup iteratively, $P_k(F_i)$.
The variance $\sigma^2_{i,j}$ is then computed on the difference between $P(F_i)$ and
$P_k(F_i)$:

$$\sigma^2_{i,j} = \sum_{k=1}^{30} \left[P(F_i) - P_k(F_i)\right]\left[P(F_j) - P_k(F_j)\right] .$$

For the other observations we use error bars taken from the corresponding references. We
discuss below how errors can be more reliably estimated as the variance among mock GIMIC
samples. Both estimates of errors are shown in Fig. 3, while Table 1 indicates the
variance among mock GIMIC samples.
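The binning and jack-knife recipe described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the subgroups are taken to be contiguous blocks of pixels (the text does not say how the 30 subgroups are formed), and the input is a placeholder array rather than real spectra.

```python
import numpy as np

def transmission_pdf(flux, bin_width=0.05):
    """Transmission PDF in 21 bins: F < 0.025, 19 bins of width 0.05, F > 0.975."""
    # Interior edges at 0.025, 0.075, ..., 0.975; the two outer bins are open-ended,
    # so pixels with F < 0 or F > 1 fall in the first and last bins.
    edges = 0.025 + bin_width * np.arange(20)
    index = np.searchsorted(edges, flux, side="right")  # bin index 0..20
    counts = np.bincount(index, minlength=21)
    # Every bin is divided by the same width, so the values sum to 1/0.05 = 20.
    return counts / (counts.sum() * bin_width)

def jackknife_covariance(flux, n_groups=30):
    """sigma^2_ij = sum_k [P(F_i) - P_k(F_i)] [P(F_j) - P_k(F_j)]."""
    full = transmission_pdf(flux)
    # Assumption: subgroups are contiguous blocks of pixels.
    groups = np.array_split(np.arange(flux.size), n_groups)
    cov = np.zeros((full.size, full.size))
    for omitted in groups:
        keep = np.ones(flux.size, dtype=bool)
        keep[omitted] = False
        diff = full - transmission_pdf(flux[keep])
        cov += np.outer(diff, diff)
    return full, cov

# Placeholder pixels (not real data), including some values below 0 and above 1.
rng = np.random.default_rng(0)
toy_flux = rng.uniform(-0.05, 1.05, size=6000)
pdf, cov = jackknife_covariance(toy_flux)
```

The diagonal of `cov` gives the per-bin jack-knife variance; the off-diagonal terms carry the bin-to-bin correlations that a $\chi^2$ comparison would need.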



1 The absorption distance is defined by
${\rm d}X/{\rm d}z = (1+z)^2\,[\Omega_m (1+z)^3 + \Omega_\Lambda]^{-1/2}$,
and quoted numerical values of $\Delta X$ assume $\Omega_m = 0.25$ and
$\Omega_\Lambda = 0.75$.

2 Pixels with $F < 0$ or $F > 1$ are assigned to the first and last PDF bins respectively,
but the number of values in each bin is divided by the same $\Delta F = 0.05$ bin width
when normalising the histogram.
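As a worked illustration of footnote 1, the absorption distance of a single redshift interval can be obtained by integrating dX/dz numerically. The sketch below is an assumption-laden example (the function name and the use of Simpson's rule are ours); the totals quoted in the text presumably sum such intervals over the forest region covered by each sightline, which this snippet does not attempt to reproduce.

```python
import math

def absorption_distance(z_min, z_max, omega_m=0.25, omega_lambda=0.75, n_steps=1000):
    """Integrate dX/dz = (1+z)^2 / sqrt(omega_m (1+z)^3 + omega_lambda) from z_min to z_max."""
    def dx_dz(z):
        return (1.0 + z) ** 2 / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)

    if n_steps % 2:          # Simpson's rule needs an even number of intervals
        n_steps += 1
    h = (z_max - z_min) / n_steps
    total = dx_dz(z_min) + dx_dz(z_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * dx_dz(z_min + i * h)
    return total * h / 3.0

# Absorption distance of one sightline covering the full z ~ 2.5 bin.
dx_single = absorption_distance(2.37, 2.71)
```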



2.2 Inconsistency between measured values of the mean 
transmission 

We compare estimates of the mean transmission collected from the literature (McDonald et
al. 2000; Kirkman et al. 2005; Kim et al. 2007; Faucher-Giguère et al. 2008), as well as
measured by us for the LP sample. Errors are based on a bootstrap procedure, by
re-sampling chunks of spectra of size 5 Å, or on the variance among chunks of the same
size (Faucher-Giguère et al. 2008, hereafter FG). Kim et al. (2007) only provide errors on
the effective optical depth, for a smaller bin in redshift, $\Delta z = 0.2$. We quote the
corresponding errors on the flux, $\sigma_F = \bar F \sigma_\tau$, and we compute
bootstrap errors for the LP using the same bins in redshift. Estimates from LP and LUQAS
are given in Table 2 (upper rows), with corresponding 2σ errors, scaled to the same
absorption distance.
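The conversion between an error on the effective optical depth and an error on the mean flux follows from first-order error propagation, assuming the effective optical depth is defined as $\tau_{\rm eff} = -\ln \bar F$ and that $\sigma_\tau$ is small:

```latex
\bar F = e^{-\tau_{\rm eff}}
\quad\Longrightarrow\quad
\sigma_F \simeq \left| \frac{{\rm d}\bar F}{{\rm d}\tau_{\rm eff}} \right| \sigma_\tau
         = \bar F \, \sigma_\tau .
```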

The mean transmission values obtained from the LUQAS and LP samples differ by 2.13, 2.40
and 2.75σ at z = 2, 2.5 and 3, respectively (where σ is obtained from adding the bootstrap
errors from both samples in quadrature). We recall that the LUQAS and LP samples are
mostly based on the same raw data, but that those data were reduced by different groups.
These differences must therefore be due to systematic errors in the adopted procedures, in
particular differences in continuum fitting and the treatment of absorption from metals.
Kim et al. (2007) also concluded that the treatment of the data, in particular continuum
fitting, leads to notable differences between authors. Published values for $\bar F$ from
LUQAS, Kirkman et al. (2005) and FG agree within 1σ at z = 2, but the differences increase
at higher z. The most discrepant values are 2.49σ at z = 2.5 (LUQAS versus McDonald et al.
2000; both are high-resolution data), and 3.9σ at z = 3 (Kirkman et al. 2005 versus FG).
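The significance of a difference between two measurements, with errors added in quadrature as described above, can be written as a one-line helper. The numbers in the example are hypothetical placeholders chosen for round arithmetic; they are not values from Table 2.

```python
import math

def tension_in_sigma(mean_a, err_a, mean_b, err_b):
    """Difference between two measurements in units of their quadrature-summed error."""
    return abs(mean_a - mean_b) / math.hypot(err_a, err_b)

# Hypothetical example values (not taken from the paper).
example = tension_in_sigma(0.80, 0.03, 0.75, 0.04)
```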
How reliable are the quoted errors? Kim et al. (2007) estimate errors on the effective
optical depth, $-\ln(\bar F)$, by bootstrapping the LUQAS spectra in chunks of 5 Å. They
do not mention convergence tests with chunk size for the error on the mean flux, but they
do note that a modified jack-knife method, using 50 Å chunks, yields errors that are too
low, comparable to the estimated variance due to continuum placement alone. They
nevertheless use jack-knife errors with 50 Å chunks to compute the variance of the
transmission PDF. Calura et al. (2012) compare errors on the TPDF estimated with a
bootstrap on 5 Å chunks and with a jack-knife on 50 Å chunks. They find similar results,
but do not mention convergence tests with chunk size either. FG (2008) mention that "We
have verified that the error estimates have converged for our choice of segment length",
but they do not present quantitative results.

Bootstrap errors depend on the arbitrary size of the chunks from which they are computed.
Indeed, for the LP data at z = 2.5, we find variances in the mean flux of
$\sigma = [0.25, 0.53, 0.78, 1.14, 1.03, 1.33, 1.15, 1.16] \times 10^{-2}$ for chunk sizes
of [0.2, 1, 5, 25, 50, 125, 250, 625] Å. Although σ converges for very large chunk sizes,
~ 25 Å, as expected, we suggest that typical published errors based on 5 Å chunks
underestimate the variance by ~ 50 per cent. Note that the largest chunk size we tested,
625 Å, is comparable to the extent of the Lyman-α forest in a z ~ 3 QSO. We discuss the
reliability of bootstrap errors using GIMIC mocks further in Section 2.4 below.
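The dependence of the bootstrap error on chunk size can be reproduced qualitatively with a toy example. The sketch below is not the analysis applied to the LP data: the function names are hypothetical, chunk sizes are given in pixels rather than Å, and the 'spectrum' is a synthetic correlated field standing in for a forest.

```python
import numpy as np

def bootstrap_mean_error(flux, chunk_pixels, n_boot=200, seed=0):
    """Bootstrap dispersion of the mean flux from resampled contiguous chunks."""
    rng = np.random.default_rng(seed)
    n_chunks = flux.size // chunk_pixels
    # Drop the incomplete trailing chunk so that every chunk carries equal weight.
    chunks = flux[: n_chunks * chunk_pixels].reshape(n_chunks, chunk_pixels)
    chunk_means = chunks.mean(axis=1)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        picks = rng.integers(0, n_chunks, size=n_chunks)  # draw chunks with replacement
        boot_means[b] = chunk_means[picks].mean()
    return boot_means.std(ddof=1)

def toy_correlated_flux(n_pixels, corr_pixels, seed=1):
    """Toy transmission with correlations over ~corr_pixels (not a forest model)."""
    rng = np.random.default_rng(seed)
    white = rng.normal(size=n_pixels + corr_pixels - 1)
    kernel = np.ones(corr_pixels) / np.sqrt(corr_pixels)  # keeps unit variance
    field = np.convolve(white, kernel, mode="valid")
    return np.exp(-np.exp(field))  # F = exp(-tau) with a lognormal tau

flux = toy_correlated_flux(n_pixels=20000, corr_pixels=50)
errors = {size: bootstrap_mean_error(flux, size) for size in (1, 10, 100, 500)}
```

Chunks much shorter than the correlation length treat correlated pixels as independent, so the estimated error is too small; it grows towards a plateau once the chunks exceed the correlation length, at the price of having fewer chunks to resample.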



2.3 Mock samples 

We use the GIMIC (Galaxies-Intergalactic Medium Interaction Calculation, Crain et al.
2009) simulations, a set of smoothed particle hydrodynamics (SPH) simulations of five
nearly spherical regions of co-moving radius $R \sim 18\,h^{-1}$ Mpc picked from the
Millennium simulation (Springel et al. 2005). The simulations have a gas particle mass of
$1.4 \times 10^6\,h^{-1}\,{\rm M}_\odot$. These 'zoomed' simulations allow us to obtain
high numerical resolution yet include the effects of large-scale power, i.e. the
simulation probes a range of environments, from massive clusters to deep voids. The effect
of large-scale structures, as discussed in Tytler et al. (2009), is thus accounted for.

The GIMIC simulations were performed with the GADGET-3 code, an evolution of GADGET-2
described last by Springel (2005), with modules for star formation, feedback from galactic
winds, chemo-dynamics, and radiative cooling and photo-heating due to an imposed evolving
UV-background, as described in Schaye & Dalla Vecchia (2008); Dalla Vecchia & Schaye
(2008) and Wiersma et al. (2009b, a), respectively; see also Schaye et al. (2010). The
assumed cosmological parameters are
$(\Omega_{\rm cdm} + \Omega_b, \Omega_\Lambda, \Omega_b, n_s, h, \sigma_8) = (0.25, 0.75, 0.045, 1, 0.73, 0.9)$.
The five GIMIC regions are picked such that their over-densities at redshift z = 1.5 are
(−2, −1, 0, 1, 2) times the root-mean-square deviation, σ, from the mean on the spatial
scale of the spheres. Re-ionization of H I is assumed to occur at z = 9, heating the IGM
to $T \sim 10^4$ K, and of He II at z = 3.5. As also shown by Wiersma et al. (2009b), the
evolution of $T_0$ and $\gamma$ in the simulations is broadly consistent with the Schaye
et al. (2000) measurements, see also Fig. 1. For densities close to the mean,
$\gamma \sim 1.3$, and the temperature-density relation is never 'inverted'.

Figure 1. Evolution of the parameters $T_0$ and $\gamma$ of the temperature-density
relation $T = T_0 (\rho/\langle\rho\rangle)^{\gamma-1}$, as measured by Schaye et al.
(2000, black circles with error bars) and in the GIMIC simulation (blue connected dots).
The temperature-density relation in GIMIC is broadly consistent with the measured values.
He II re-ionization causes the rise in $T_0$ and the corresponding dip in $\gamma$ in the
GIMIC simulations at redshift z ~ 3.2, but $\gamma$ never drops below ~ 1.3. Red symbols
are from the model of Bolton et al. (2008): filled squares are for their default model,
open squares are for their model 20-256g3 that best fits the transmission PDF they
inferred from LUQAS. This model has an inverted temperature-density relation, i.e.
$\gamma < 1$.

We compute 1000 mock Lyman-α forest spectra by tracing straight lines through a cube
(footnote 3) embedded well within each of the five spheres, extracting density,
temperature and peculiar velocity along them, and then computing the corresponding optical
depth as described in Theuns et al. (1998). Crain et al. (2009) explain in their appendix
how to combine results from individual spheres to correctly reproduce statistics valid for
the full Millennium volume: we use the weights listed in their Table A1. Given these
weights, we generate a 'mock' LP sample by randomly selecting spectra from each of the
five spheres until the redshift path of mock and LP samples are the same. We repeat this
procedure 400 times to obtain a 'suite' of mock samples. Note that every single mock
sample in the suite has the same redshift path as the LP sample. Each spectrum is
convolved with a Gaussian to match the UVES spectral resolution, re-binned to the UVES
pixel size, and we add noise with similar statistical properties as measured in the
observed spectra. Our results do not change significantly if we only use the GIMIC mean
density sphere. We can compute flux statistics for a given mock sample simply from all
pixels in all short spectra that make up the mock sample. However, when computing
bootstrap errors below, we combine these short spectra into a Lyman-α spectrum that mimics
the full absorption distance of a given LP spectrum.

3 The cubes have sides ~ 11 $h^{-1}$ co-moving Mpc, which ensures we stay well away from
the edges of the spheres to avoid artificial boundary effects, see Crighton et al. (2010)
for details. We will call a Lyman-α spectrum obtained from a single cut through the cube a
short spectrum.

It is difficult to accurately mimic, in the simulated samples, the effect of 'continuum
fitting' as applied to observations, because the wavelength range over which the observed
continuum is supposed to vary is large compared to the size of an individual simulated
spectrum. In the observations, the true and estimated continua are thought to differ by
about 1-3 per cent (see e.g. Aracil et al. 2004; Faucher-Giguère et al. 2008). Therefore,
to investigate plausible continuum uncertainties, we compare statistics from the original
samples to those in which we multiply the flux by a constant factor of 1.02 to mimic a
2 per cent systematic offset between 'true' and 'fitted' continua.
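The processing applied to each simulated spectrum (Gaussian smoothing to the instrumental resolution, re-binning, added noise, and the optional 2 per cent continuum offset) can be sketched as below. This is a minimal illustration, not the pipeline used for the paper: the function name, the periodic padding, the interpretation of $\lambda/\Delta\lambda \sim 45000$ as a Gaussian FWHM, the re-binning factor and the constant noise level are all assumptions.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def degrade_spectrum(flux, pixel_kms, resolving_power=45000.0, rebin=2,
                     snr=30.0, continuum_factor=1.0, seed=0):
    """Smooth, re-bin, add noise and rescale a simulated transmission spectrum."""
    rng = np.random.default_rng(seed)
    # Assumption: the resolution element c/R is the FWHM of a Gaussian profile.
    fwhm_kms = C_KMS / resolving_power
    sigma_pix = fwhm_kms / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel_kms
    half = int(np.ceil(4.0 * sigma_pix))          # truncate the kernel at 4 sigma
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    kernel /= kernel.sum()
    # Assumption: periodic padding, so the output has the same length as the input.
    padded = np.concatenate([flux[-half:], flux, flux[:half]])
    smooth = np.convolve(padded, kernel, mode="valid")
    # Re-bin by averaging groups of `rebin` neighbouring pixels.
    n_out = smooth.size // rebin
    binned = smooth[: n_out * rebin].reshape(n_out, rebin).mean(axis=1)
    # Assumption: Gaussian noise with constant dispersion 1/snr relative to the continuum.
    noisy = binned + rng.normal(scale=1.0 / snr, size=n_out)
    # A factor of 1.02 mimics a 2 per cent offset between true and fitted continua.
    return continuum_factor * noisy

shifted = degrade_spectrum(np.ones(1000), pixel_kms=1.0, snr=1e9, continuum_factor=1.02)
```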

The Lyman-α optical depth in a spectrum depends on the evolving photo-ionization rate,

$$\Gamma = 4\pi \int_{\nu_{\rm th}}^{\infty} \frac{J(\nu)}{h\nu}\,\sigma_\nu\,{\rm d}\nu = \Gamma_{12} \times 10^{-12}\,{\rm s}^{-1} , \qquad (1)$$

where $J(\nu)$ is the mean intensity of the ionising radiation at a given redshift,
$\nu_{\rm th}$ is the frequency of the Lyman limit, and $\sigma_\nu$ is the hydrogen
photo-ionization cross section. Within a suite of mock samples we use the same value for
$\Gamma_{12}$, and will refer to the 'ensemble average' mean transmission of the suite as
$\langle F \rangle$. The mean transmission, $\bar F$, of a given mock sample can differ
significantly from the ensemble average $\langle F \rangle$ of the corresponding suite
because of 'sample variance', and the same is true for its PDF. We estimate the sample
variance in a given suite by comparing all 400 mock samples that make up the suite. We
emphasize that because the simulated samples keep probing the same density field, the real
dispersion is likely to be larger than this estimate.
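The construction of a suite of mock samples, and the sample variance measured from it, can be sketched as follows. This is an illustrative toy, assuming that all short spectra have the same length (so that a fixed number of them matches the target redshift path) and using placeholder inputs; the function name is hypothetical and the equal sphere weights stand in for the actual weights of Crain et al. (2009), which are not reproduced here.

```python
import numpy as np

def mock_sample_means(short_means, weights, n_short, n_mocks=400, seed=0):
    """Mean transmission of each mock sample in a suite.

    short_means : list of 1-D arrays, the mean flux of every short spectrum, one array per sphere
    weights     : relative weight of each sphere
    n_short     : number of short spectra needed to match the observed redshift path
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    means = np.empty(n_mocks)
    for m in range(n_mocks):
        # Pick a sphere for each short spectrum according to the weights...
        spheres = rng.choice(len(short_means), size=n_short, p=p)
        # ...then a random short spectrum from that sphere.
        picks = [short_means[s][rng.integers(short_means[s].size)] for s in spheres]
        means[m] = np.mean(picks)
    return means

# Placeholder inputs (not simulation output): five spheres, 1000 short spectra each.
rng = np.random.default_rng(42)
toy_short_means = [rng.normal(0.77, 0.1, size=1000) for _ in range(5)]
suite = mock_sample_means(toy_short_means, weights=[1, 1, 1, 1, 1], n_short=100)
ensemble_average = suite.mean()          # plays the role of <F>
sample_variance_sigma = suite.std(ddof=1)  # scatter of the sample means about <F>
```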

The value of the photo-ionization rate $\Gamma_{12}$ is uncertain. Theuns et al. (1998)
show that in the optically thin case, simulations can be run with one value for
$\Gamma_{12}$ and later accurately scaled to another value. To investigate the effect of
uncertainties in $\Gamma_{12}$, we generate many suites of mock samples, with different
values of $\Gamma_{12}$ and hence of the ensemble average transmission,
$\langle F \rangle$.
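One common way to implement such a rescaling, sketched here under the assumption that the gas is highly ionised and optically thin so that the optical depth in every pixel scales as $1/\Gamma_{12}$, is to multiply all optical depths by a single factor chosen to give the desired mean transmission. The function name and the bisection solver are illustrative choices, not a description of the code used for the paper.

```python
import math

def rescale_to_mean_flux(tau, target_mean_flux, tol=1e-10):
    """Find A such that mean(exp(-A * tau)) equals the target mean transmission.

    Under the optically thin assumption, A = Gamma_12(original) / Gamma_12(rescaled).
    """
    def mean_flux(a):
        return sum(math.exp(-a * t) for t in tau) / len(tau)

    if not 0.0 < target_mean_flux < 1.0:
        raise ValueError("target mean transmission must lie between 0 and 1")
    lo, hi = 0.0, 1.0
    while mean_flux(hi) > target_mean_flux:   # widen the bracket until it contains A
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("target mean transmission cannot be reached")
    while hi - lo > tol * hi:                 # bisection: mean_flux decreases with A
        mid = 0.5 * (lo + hi)
        if mean_flux(mid) > target_mean_flux:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy optical depths (placeholder values), rescaled to a mean transmission of 0.77.
toy_tau = [0.05 * i for i in range(1, 101)]
scale = rescale_to_mean_flux(toy_tau, 0.77)
```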



2.4 Estimates of errors with mock samples 

We can check the reliability of the bootstrap errors discussed in Section 2.2 using GIMIC
mock samples. We first examine whether mocks generated from the simulation give the same
errors on the mean flux as observed samples when the errors are estimated in the same way.
Faucher-Giguère et al. (2008) divide the variance $\sigma_i$ of the mean flux measured
along chunks of 3 Mpc proper size, by the square root of the number of chunks. They find
$\sigma_i = [0.13, 0.11, 0.09]$ at z = [3, 2.4, 2], with 193, 263 and 50 chunks
respectively. Applying this procedure first to the LP data, we find
$\sigma_i = [0.125, 0.13, 0.095]$ at z = [3, 2.5, 2], with 37, 262 and 413 chunks
respectively. Applied to our mocks we find $\sigma_i = [0.14, 0.14, 0.11]$. Therefore both
our analysis of the LP observations, and of the GIMIC simulations, give error estimates in
reasonable agreement with those obtained by Faucher-Giguère et al. (2008). Kim et al.
(2007) estimate errors on the effective optical depth, $-\ln(\bar F)$, by bootstrapping
the LUQAS spectra in chunks of size 5 Å. We concentrate on their estimate at z = 2.59 with
a bin in redshift of $\Delta z = 0.2$, corresponding to a velocity path of
88682 km s$^{-1}$. We use the GIMIC simulations to generate many mock versions of the
LUQAS sample, each with the same velocity path, and estimate the variance σ for the same
chunk size. The average value for our mocks is $\sigma_F = \bar F \sigma_\tau = 0.0124$,
identical to their bootstrap error. Finally, we compare errors estimated from GIMIC
against our own bootstrap errors obtained from the LP data, as discussed in the previous
section. At z = 2.5 (and for a velocity path of ~ 190000 km s$^{-1}$), we calculate
bootstrap variances of $\sigma = [0.26, 0.54, 0.80, 0.98, 1.22, 1.15] \times 10^{-2}$ for
chunk sizes of [0.2, 1, 5, 25, 125, 625] Å for the simulated mocks, as compared to
$\sigma = [0.25, 0.53, 0.78, 1.14, 1.33, 1.16] \times 10^{-2}$ for the LP observational
data. We conclude that errors computed from GIMIC mocks are in excellent agreement with
published errors, as well as errors obtained by us from the LP data, when simulated and
observed errors are calculated in the same way.

The bootstrap errors discussed above clearly depend on the value of the chunk size for
which they are computed, both for the data and for the simulated spectra. They start to
converge for relatively large chunk sizes of ~ 25 Å, although the convergence is not yet
clearly reached. Using simulations we can also calculate the variance between different
mock samples: simply generate many mock samples for a given simulation, each with the same
redshift path as a given observed sample, and evaluate the variance between mock samples.
This variance is $[0.55, 0.88, 1.7] \times 10^{-2}$ at redshifts z = [2, 2.5, 3], as
compared to bootstrap errors using 25 Å chunks of $[0.50, 0.98, 2.1] \times 10^{-2}$, in
reasonable agreement. Given the dependence of the variance on chunk size for small chunks,
we will use the variance between mock samples to characterise the expected level of
scatter in the data and to investigate the consistency between simulation and data. We
suggest that the error estimates that we obtain from the variance between mocks are more
realistic than the published, observed bootstrap errors.



3 THE TRANSMISSION PDF 

We have computed the transmission PDFs of the LP sample over the same small redshift
ranges as used by Kim et al. (2007). Because these redshift ranges are relatively narrow,
evolution over them can be safely neglected, and hence we simply use simulation snapshots
at a single redshift (z ~ 2, 2.5 and 3 for the three bins used by Kim et al. 2007) when
comparing to the observed data.

3.1 Variance of the transmission PDF 

Fig. 2 illustrates that continuum fitting quite noticeably affects the transmission PDF
near F ~ 1, and comparison to the over-plotted data also suggests that uncertainties in
continuum placement can explain the large differences in the observed PDFs at F ~ 1.
Recall that we mimic the errors in continuum fitting by a systematic shift in the
continuum (Section 2.3). Clearly, given these uncertainties, this part of the TPDF cannot
constrain models robustly (see also Meiksin et al. 2001). Fortunately, the distribution of
pixels with F < 0.7, say, is relatively insensitive to the error in the continuum
placement for high-resolution spectra and can thus be used to constrain the mean
transmitted flux.

Figure 2. Effect of 'continuum fitting' the GIMIC simulations. Solid
curves show the $2\sigma$ range in the transmission PDFs of a sample
of mocks with given ensemble averaged transmission,
$\langle F \rangle$. When errors in continuum fitting are mimicked by
a systematic shift in the continuum (see Section 2.2), the range is
enclosed by full lines. Continuum fitting makes the shape of the TPDF
uncertain close to $F = 1$. Note that we only show the range
$0.6 \leq F \leq 1$. For $F < 0.6$, we find that the continuum
correction is small compared to the $2\sigma$ range. Symbols with
error bars are the data from the LP sample (red), LUQAS (blue),
McDonald et al. (2000) (black) and Calura et al. (2012) (green).
These also show significant differences in the range $F > 0.7$,
plausibly due to the different continuum fitting methods applied in
the data reduction.

The GIMIC simulations that best reproduce the observed transmission
PDFs for $F < 0.7$ have ensemble averaged mean transmissions of
$\langle F \rangle = 0.86$, 0.77 and 0.71 at redshifts $z = 2$, 2.5
and 3, respectively, as discussed in more detail below. Observed and
mock TPDFs with these values of $\langle F \rangle$ are compared at
$z = 2$, 2.5 and 3 in Fig. 3. Light (dark) shaded regions show the
$1\sigma$ and $2\sigma$ dispersion$^{4}$ among TPDFs of this
particular suite of mocks. There is considerable variance between the
transmission PDFs of mock realisations, even though each mock
realisation is generated from the same simulation with the full
absorption distance of the LP observed sample.
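The shaded bands can be thought of as per-bin percentiles across the
mock realisations. The sketch below is illustrative only: the function
names and the linear-interpolation convention for percentiles are
assumptions, while the percentile levels are those quoted in the
footnote on the dispersion.

```python
def percentile(sorted_vals, q):
    """Percentile q (0 to 100) of a sorted list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def sigma_bands(mock_pdfs, levels=(2.275, 15.8655, 84.13, 97.725)):
    """Per-bin percentile bands across mock realisations.

    mock_pdfs: list of realisations, each a list with one PDF value per
    flux bin. Returns one tuple (lower 2-sigma, lower 1-sigma,
    upper 1-sigma, upper 2-sigma) per bin.
    """
    bands = []
    for bin_values in zip(*mock_pdfs):   # iterate over flux bins
        ordered = sorted(bin_values)
        bands.append(tuple(percentile(ordered, q) for q in levels))
    return bands
```

Calling `sigma_bands` on 400 mock TPDFs would return, for each flux
bin, the edges of the light and dark shaded regions.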

The variance in the mocks increases with redshift since the redshift
path decreases. The ratio of the variance computed from GIMIC mocks
to the jack-knife variance is shown in Fig. 4. Except at $z = 2.5$,
the variance in the mocks is systematically larger, by 10 to 50% at
$z = 2$ and by up to 100% at $z = 3$. Given that the simulations, if
anything, underestimate the sample variance, this suggests once more
that the observationally determined jack-knife errors are too small.
Although this is more difficult to assess from other works, we found
that error estimates from the jack-knife method are very unstable
given the relatively small size of the sample. We will therefore
quote variances computed from our mocks only.
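For reference, a delete-one jack-knife estimate of the TPDF variance
can be sketched as follows. The chunking, the binning convention and
the function names are assumptions for illustration, and flux values
are assumed to lie within the outer bin edges; this is not the exact
procedure of the published analyses.

```python
def tpdf(flux, edges):
    """Transmission PDF: fraction of pixels per flux bin divided by bin width."""
    counts = [0] * (len(edges) - 1)
    for f in flux:
        for j in range(len(counts)):
            is_last = j == len(counts) - 1
            # The last bin is closed on the right so that F = edges[-1] is counted.
            if edges[j] <= f < edges[j + 1] or (is_last and f == edges[-1]):
                counts[j] += 1
                break
    n = len(flux)
    return [c / (n * (edges[j + 1] - edges[j])) for j, c in enumerate(counts)]

def jackknife_variance(chunks, edges):
    """Delete-one jack-knife variance of the TPDF, one value per flux bin."""
    n = len(chunks)
    estimates = []
    for i in range(n):
        # Recompute the TPDF with chunk i left out.
        kept = [f for k, chunk in enumerate(chunks) if k != i for f in chunk]
        estimates.append(tpdf(kept, edges))
    variances = []
    for j in range(len(edges) - 1):
        vals = [e[j] for e in estimates]
        mean = sum(vals) / n
        variances.append((n - 1) / n * sum((v - mean) ** 2 for v in vals))
    return variances
```

With only a handful of chunks the leave-one-out estimates are few and
strongly dependent on individual chunks, which is one way to see why
such an estimate can be unstable for a small sample.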

$^{4}$ They correspond to the 2.275, 15.8655, 84.13 and 97.725
percentiles computed from 400 realisations.

Figure 3. Top to bottom: PDF of the transmission at $z = 2$, 2.5 and
3, of the best-fitting simulations (continuum fitted GIMIC
simulation: solid curve; Bolton et al. 2008 model 20-256g3, shown as
open squares in Fig. 1: dashed curve), compared to observational data
(symbols with error bars; LP sample: red; LUQAS: blue; Calura et al.:
green; M00 at $z = 2.41$ and 3.0: black). Error bars are $1\sigma$
jack-knife errors for LUQAS, LP and Calura et al., and bootstrap
errors from 5 Å chunks for M00. Light (dark) shaded regions
correspond to the $1\sigma$ and $2\sigma$ range computed from 400
mock LP samples in GIMIC simulations with redshift and ensemble
averaged mean transmission $\langle F \rangle$ as indicated in each
panel. The simulations and various data sets agree well within the
$2\sigma$ range at all three redshifts. Insets show
(model $-$ data)$/\sigma$, where model is the best-fitting PDF for
GIMIC, and data and $\sigma$ are the LP PDF and the variance
estimated in GIMIC simulations. The GIMIC simulations fit the data
for $F < 0.7$ even though $\gamma > 1$ at all $z$. For $F > 0.7$ and
$z \geq 2.5$, different data sets are inconsistent and sensitive to
continuum fitting (missing points are above 4).

Figure 4. Ratio of the PDF variance computed from GIMIC mocks to the
variance computed with the jack-knife method, for bins in transmitted
flux, in three different redshift bins.

The LP and LUQAS data fall well within the $2\sigma$ region at all
$z$ for $F < 0.7$, with a possible exception of the
lowest-transmission bin at $z = 2$. It is possible that the latter
discrepancy is due to the fact that simulations that assume the gas
to be optically thin do not reproduce the observed number of strong
lines (e.g. Tytler et al. 2009). Including self-shielding appears to
solve this issue (Altay et al. 2011). The results from the LP and
LUQAS samples are almost identical in bins where uncertainties in the
position of the continuum do not interfere with the TPDF. They are
also very similar to the results from the Calura et al. (2012)
sample, which has one quasar in common (making up one fourth of the
total sample in this redshift bin). They also agree with the results
from McDonald et al. (2000) within the $2\sigma$ range estimated from
the simulations.

The difference between the best-fitting simulated PDFs in GIMIC mock
samples (among different values of $\Gamma_{12}$ only) and our
determination of the TPDF from the LP, divided by the $1\sigma$ range
of the mock LP TPDFs in the GIMIC simulation, is shown at the bottom
of each panel in Fig. 3. There is no evidence that the observed and
simulated GIMIC PDFs are inconsistent at any redshift. The
statistical interpretation of this measurement, and the derived
constraints on the ionising background rate, are discussed further in
Section 4.

3.2 Variance of the mean transmission 

Interestingly, observations as well as simulations show large
quasar-to-quasar variations in the mean transmission at a given
redshift. To illustrate the origin of this large scatter, we analyse
400 mock samples from GIMIC generated with a given ensemble average,
$\langle F \rangle = 0.79$, at redshift $z = 2.5$. The large scatter
is due to strong absorption lines, which contribute significantly to
the mean opacity: the small number of strong lines per QSO spectrum
introduces the observed scatter, as we now show (see also Desjacques
et al. 2007).

Figure 5. Mean transmission of spectra that include all lines with
equivalent width $W < W_{\rm cut}$, as a function of $W_{\rm cut}$
(in Å), for the LP sample (red dots), and the corresponding $1\sigma$
and $2\sigma$ range in this quantity estimated from GIMIC mock
samples (grey and dark regions, respectively). The net mean
transmission values, $\bar{F}$, for the LUQAS, M00, Faucher-Giguère
et al. (2008; FG) and Kirkman et al. (2005) data are indicated by
horizontal lines (FG and LUQAS values of $\bar{F}$ are identical).
There is significant scatter in $\bar{F}$ of the GIMIC samples when
$W_{\rm cut} \sim 1.5$ Å, but as strong lines are excised, the
dispersion decreases significantly. This shows that strong lines are
mostly responsible for the scatter. The observed values of the net
mean transmission, $\bar{F}(W < \infty)$, are well within the
$2\sigma$ range estimated from the GIMIC simulations.

We have used a simple criterion to identify 'lines' in the spectrum
as regions between two maxima in $F$; we also demand that the
corresponding minimum is sufficiently different from the lowest
maximum to avoid identifying noise features as lines. More
specifically, this algorithm identifies all local minima and maxima
on a spectrum smoothed with a Gaussian kernel of width
8 km s$^{-1}$. A line consists of all pixels between two maxima that
satisfy the following two conditions: (i) two successive maxima must
be separated by more than 8 km s$^{-1}$ and (ii) the flux difference
between the maxima and the minimum they straddle must be larger than
four times the estimated error per pixel. Each pixel is then assigned
to a line with a given equivalent width, $W$. We can now compute the
mean transmission in a mock sample (or the LP data) for all pixels in
lines with $W$ less than some maximum equivalent width,
$W_{\rm cut}$.
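The line-finding criterion above can be sketched in a few lines of
code. This is a hedged illustration rather than the authors'
implementation: the function names are hypothetical, the quoted kernel
'width' is taken to be the Gaussian sigma, and candidate regions that
fail either condition are merged with the next one so that every pixel
ends up in some line. Both choices are assumptions that the text does
not specify.

```python
import math

def gaussian_smooth(flux, sigma_pix):
    """Smooth with a truncated Gaussian kernel, renormalised at the edges."""
    half = max(1, int(4 * sigma_pix))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_pix) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(flux)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            if 0 <= i + k < len(flux):
                num += w * flux[i + k]
                den += w
        smoothed.append(num / den)
    return smoothed

def find_lines(flux, noise, dv, min_sep=8.0, n_sigma=4.0):
    """Return half-open (start, end) pixel ranges of lines bounded by maxima.

    flux: transmitted flux per pixel; noise: estimated error per pixel;
    dv: pixel size in km/s; min_sep: kernel width and minimum separation
    of maxima in km/s.
    """
    # Assumption: the quoted kernel width is used as the Gaussian sigma.
    sm = gaussian_smooth(flux, min_sep / dv)
    # Local maxima of the smoothed spectrum; both ends act as boundaries.
    maxima = [0]
    maxima += [i for i in range(1, len(sm) - 1) if sm[i - 1] < sm[i] >= sm[i + 1]]
    maxima.append(len(sm) - 1)
    lines, left = [], 0
    for right in maxima[1:]:
        depth = min(sm[left], sm[right]) - min(sm[left:right + 1])
        # (i) maxima separated by more than min_sep,
        # (ii) depth below the lowest maximum larger than n_sigma * noise.
        if (right - left) * dv > min_sep and depth > n_sigma * noise:
            lines.append((left, right))
            left = right
        # Otherwise (assumption) the candidate is merged with the next one.
    return lines

def equivalent_width(flux, line, dlam):
    """Equivalent width of one line: sum of (1 - F) times the pixel width dlam."""
    start, end = line
    return sum(1.0 - f for f in flux[start:end]) * dlam
```

A different merging rule, or a kernel width interpreted as a FWHM,
would change which weak features are split into separate lines, so the
equivalent widths of blended features depend on these choices.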

The mean transmission, $\bar{F}(W_{\rm cut})$, for all pixels in
lines weaker than a given value of $W_{\rm cut}$ is plotted as a
function of $W_{\rm cut}$ in Fig. 5 as red dots for the LP sample,
with grey and dark regions showing the $1\sigma$ and $2\sigma$ range
estimated from the mock GIMIC samples. For a high cut in $W$, all
pixels are used and $\bar{F}(W_{\rm cut} = \infty)$ is simply the net
mean transmission $\bar{F}$; we also indicate $\bar{F}$ from LUQAS,
M00, FG and Kirkman et al. (2005).
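Given a list of lines and their equivalent widths, the statistic
$\bar{F}(W_{\rm cut})$ is a straightforward average. The sketch below
assumes (hypothetically) that lines are stored as half-open pixel
ranges with one equivalent width per line; it is an illustration of
the definition, not the authors' code.

```python
def mean_transmission_below(flux, lines, widths, w_cut):
    """Mean flux over all pixels in lines with equivalent width W < w_cut.

    flux: transmitted flux per pixel; lines: list of half-open
    (start, end) pixel ranges; widths: equivalent width of each line.
    Returns NaN if no line passes the cut.
    """
    total = 0.0
    count = 0
    for (start, end), w in zip(lines, widths):
        if w < w_cut:
            segment = flux[start:end]
            total += sum(segment)
            count += len(segment)
    return total / count if count else float("nan")
```

Passing `float("inf")` as `w_cut` includes every line and therefore
returns the net mean transmission of all assigned pixels.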

For mock samples with ensemble average $\langle F \rangle = 0.79$ we
find that the (continuum fitted) $\bar{F}(W_{\rm cut} = \infty)$
varies between 0.79 and 0.84 within $2\sigma$. Note that our
procedure to estimate the errors due to 'continuum fitting' makes the
mean transmission, $\bar{F}$, systematically higher than
$\langle F \rangle$. Observed determinations of the mean transmission
are shown with horizontal lines in the figure. It appears that,
despite the large dispersion amongst observed values, they are
nevertheless consistent, because the expected sample variance, as
inferred from GIMIC (and consistent with bootstrap estimates using
real data for sufficiently large chunk size), is so large. The origin
of the large variance is the presence of strong lines.

Figure 6. Dependence of the transmission PDF on the ensemble averaged
$\langle F \rangle$ at $z = 2.5$. The dark shaded region shows the
$2\sigma$ range computed from 400 mock samples in a GIMIC simulation
with $\langle F \rangle = 0.77$ as in Fig. 5. Symbols with error bars
are as in Fig. 5. Solid and dashed hashed regions correspond to the
$2\sigma$ range in GIMIC simulations with $\langle F \rangle = 0.83$
and 0.74, respectively. At these extremes the observational data (for
transmission $0.1 < F < 0.7$) fall just outside the $2\sigma$ range
of the simulation for at least one data point.

Figure 7. Same as Fig. 6 but for the dependence of the mean
transmission as a function of maximum line width,
$\bar{F}(W_{\rm cut})$, with $W_{\rm cut}$ in Å. The dark shaded
region is the $2\sigma$ range for $\langle F \rangle = 0.79$; solid
and dashed hashed regions correspond to the $2\sigma$ range in the
GIMIC simulations with $\langle F \rangle = 0.81$ and 0.75,
respectively.

Figure 8. Reduced $\chi^2$ as a function of the ensemble averaged
$\langle F \rangle$ at $z = 2$, 2.5 and 3.0 (top to bottom). The
covariance matrix is measured using the variance among GIMIC mock
samples. $\chi^2$ corresponds to the difference between one TPDF and
the averaged TPDF from 400 GIMIC mock samples assuming different
$\langle F \rangle$. As a validity check, the TPDF measured in one
GIMIC mock sample with $\langle F \rangle = 0.86$, 0.77 and 0.71 at
$z = 2$, 2.5 and 3.0, respectively, is best fitted with the same
value of $\langle F \rangle$ (dotted lines show the average reduced
$\chi^2$ and the $1\sigma$ range among 400 samples). The evolution of
the reduced $\chi^2$ as a function of $\langle F \rangle$ is similar
in the case of the observed LP (solid lines).

Figure 9. Mean hydrogen photo-ionization rate, $\Gamma$, as a
function of redshift, from summing over sources as computed by Haardt
& Madau (2001, red) and Faucher-Giguère et al. (2009, drawn orange
line), and from comparing simulated mock spectra to observed spectra.
Blue and green points are our ($2\sigma$) determinations from
comparing, respectively, the TPDF and the mean flux in the GIMIC
simulations to the LP data; orange symbols are the Faucher-Giguère et
al. (2008) determination using a sample of 84 high-resolution
quasars.



Table 2. Upper rows: measured value of the mean transmission in three
redshift ranges for the LUQAS and LP samples. Using the LP as a
reference, the redshift ranges are 1.88-2.37, 2.37-2.71 and
2.71-3.21, with absorption distances of 10.3, 5.8 and 2.9,
respectively. For LUQAS, Kim et al. (2007) provide errors computed by
bootstrapping chunks of size 5 Å within bins of size $dz = 0.2$;
their errors are then rescaled to the LP absorption distances. The
errors given for LP correspond to the variance between GIMIC mock
samples. Lower rows: ensemble-averaged $\langle F \rangle$ in GIMIC
simulations that reproduce within $2\sigma$ the LP observed
transmission PDF and mean transmission, $\bar{F}$; the last row gives
the ionising background rate values in the same GIMIC simulations.
$\langle F \rangle$ refers to an ensemble average; $\bar{F}$ refers
to a single realisation of such an ensemble, and is generally larger
than $\langle F \rangle$ because it includes a 2% continuum fitting
offset.

z = 2.0           z = 2.5           z = 3.0

Measured $\bar{F}$ ($\pm 2\sigma$)
0.887 ± 0.011     0.812 ± 0.017     0.780 ± 0.034     LP
0.868 ± 0.010     0.775 ± 0.021     0.713 ± 0.032     LUQAS

Derived $\langle F \rangle$ with $2\sigma$ variance from GIMIC mocks
0.86 +…/−…        0.77 +…/−…        0.71 +…/−…        (from TPDF)
0.85 ± 0.02       0.79 +…/−…        …                 (from $\bar{F}$)

Derived $\Gamma_{12}$ with $2\sigma$ range from GIMIC mocks
1.3 (0.9, 2.0)    1.2 (0.8, 1.8)    1.3 (0.6, 2.6)



4 CONSTRAINTS ON THE MEAN TRANSMISSION AND 
THE INTENSITY OF THE IONISING BACKGROUND 

The photo-ionization rate can be estimated by scaling mock spectra
obtained from simulations to the observed mean transmission
$\bar{F}$, and calculating the corresponding value of $\Gamma_{12}$.
To determine the range of $\Gamma_{12}$ values consistent with the
observed $\bar{F}$, we need some measure of the expected variance of
$\bar{F}$ around its ensemble average $\langle F \rangle$. In
principle, it should also be possible to use the full transmission
PDF rather than just its mean.
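A common way to carry out such a rescaling, sketched here under stated
assumptions, is to multiply the simulated optical depths by a constant
factor until the mean transmission matches the target, and to use the
fact that for optically thin gas in ionisation equilibrium the optical
depth scales inversely with the photo-ionization rate. The function
names and the bisection solver are illustrative choices, not taken
from the paper.

```python
import math

def rescale_factor(tau, f_target, tol=1e-10):
    """Find A such that the mean of exp(-A * tau) equals f_target (bisection).

    tau: list of pixel optical depths from the simulation.
    """
    def mean_flux(a):
        return sum(math.exp(-a * t) for t in tau) / len(tau)

    lo, hi = 0.0, 1.0
    # Expand the upper bound until the mean flux drops below the target.
    while mean_flux(hi) > f_target:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("target mean transmission cannot be reached")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_flux(mid) > f_target:
            lo = mid      # too transparent: need a larger factor
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rescaled_gamma(gamma_sim, factor):
    """Assumption: tau scales as 1/Gamma, so Gamma scales as 1/factor."""
    return gamma_sim / factor
```

A factor larger than one means the simulated spectra were too
transparent, so the inferred photo-ionization rate is lower than the
one imposed in the simulation.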

To judge how well a given realisation of a mock transmission PDF fits
an observational determination, one could use the usual
$\chi^2$-estimator for values of the transmission between 0.1 and
0.7. A covariance matrix can be computed by cross-correlating
estimates of the TPDF from a large number of bootstrap samples, as
described in Lidz et al. Note that all bootstrap samples are then by
construction sub-samples of the observed spectra, which limits their
usefulness if the observed path length is small. When this is applied
to the transmission PDF, it transpires that the covariance matrix is
nearly singular and hence needs to be 'regularized' using a singular
value decomposition. We found that the values obtained for $\chi^2$
then depend strongly on the number of singular values regularised,
which severely compromises the usual statistical interpretation of
$\chi^2$. We can get around this problem by using the simulations to
estimate the variance on either $\bar{F}$ or the transmission PDF,
for samples with given $\langle F \rangle$.

However, we have seen that the value of the mean transmission
$\bar{F}$ for a given realisation of a mock sample can differ
considerably from the ensemble average $\langle F \rangle$ of the
sample. Since the observations only provide a single measurement of
$\bar{F}$, a potentially large range of ensemble averages is
consistent with that $\bar{F}$. This is illustrated in Fig. 6 for the
transmission PDF, and in Fig. 7 for $\bar{F}$, both at redshift
$z = 2.5$. In both cases the dark grey band shows the $2\sigma$ range
in mock samples drawn from simulations with a given value of the
ensemble averaged transmission ($\langle F \rangle = 0.77$ and 0.79,
respectively). As before, each sample has the same redshift path as
the LP sample.

Considering first the mean transmission as a function of line-width,
we demand that the mean transmission with $W_{\rm cut} = \infty$
falls within the $2\sigma$ region. We interpret these extreme values
as $2\sigma$ limits on the ensemble average $\langle F \rangle$. The
$2\sigma$ allowed range is then
$0.75 \leq \langle F \rangle \leq 0.81$. As before, the determination
of $\bar{F}$ in the mock sample is done after 'continuum fitting',
which implies that $\bar{F}$ will be systematically higher than
$\langle F \rangle$. Performing the same analysis at $z = 3$ and at
$z = 2$ yields $2\sigma$ allowed ranges of
$0.62 \leq \langle F \rangle \leq 0.78$ and
$0.83 < \langle F \rangle < 0.87$, respectively (Table 2).

Fitting the TPDF requires a measure of the covariance matrix. As
explained above, data samples are not yet large enough to provide a
reliable estimate of it. Rather, we compute the covariance using 400
independent determinations of the TPDF in GIMIC mock samples. The
covariance matrix can thus be inverted without further
regularization. We use 13 bins for a range of flux $0.1 < F < 0.7$,
corresponding to $k = 12$ degrees of freedom. The evolution of the
reduced $\chi_r = (\chi^2 - k)/\sqrt{2k}$ is shown in Fig. 8 (solid
lines). To check the validity of this procedure, we derive the same
evolution for different mock samples. Assuming a true value
$\langle F \rangle_{\rm true}$ (0.86, 0.77 and 0.71 at $z = 2$, 2.5
and 3, respectively), we compare again 400 mock samples with
different values of $\langle F \rangle$ to the average TPDF with
$\langle F \rangle_{\rm true}$, and compute the associated reduced
$\chi_r$. The average evolution of $\chi_r$ and its dispersion
(dotted lines in Fig. 8) are consistent with the observed evolution
using the LP TPDF, despite a slight tension at $z = 2.5$. We provide
a best fitting value and a $2\sigma$ range for $\langle F \rangle$
using the smooth average evolution in GIMIC samples:
$0.845 < \langle F \rangle = 0.86 < 0.877$ at $z = 2.0$,
$0.745 < \langle F \rangle = 0.77 < 0.795$ at $z = 2.5$ and
$0.66 \leq \langle F \rangle = 0.71 < 0.77$ at $z = 3.0$. Note that
the best fitting value for $\langle F \rangle$ is slightly shifted
compared to the value corresponding to the observed minimum, in order
to best reproduce the overall evolution of $\chi_r$. Also, the range
at $z = 2.5$ as determined from the evolution of $\chi^2$ is narrower
than the range determined by eye in Fig. 6. Those estimates for
$\langle F \rangle$ and their $2\sigma$ uncertainty at these three
redshifts can be compared to the values given in Table 2, which refer
to the allowed range of $\langle F \rangle$ for which GIMIC
simulations reproduce within $2\sigma$ the LP observed transmission
PDF (Fig. 6). Our values are generally in agreement with previously
published values, but our quoted uncertainties are significantly
larger.
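The fit statistic described above can be sketched as follows. The
covariance is estimated from the scatter among mock TPDFs, the
$\chi^2$ is evaluated by solving a linear system rather than forming
the inverse explicitly, and the reduced statistic is
$(\chi^2 - k)/\sqrt{2k}$. Function names and the plain Gaussian
elimination solver are illustrative assumptions, not the authors'
code.

```python
import math

def covariance(samples):
    """Sample mean and covariance matrix from a list of mock TPDF vectors."""
    n, m = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(m)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(m)] for i in range(m)]
    return mean, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def reduced_chi(data, mock_pdfs, k):
    """Reduced statistic (chi^2 - k) / sqrt(2k) of data against the mock mean."""
    mean, cov = covariance(mock_pdfs)
    diff = [d - m for d, m in zip(data, mean)]
    chi2 = sum(di * yi for di, yi in zip(diff, solve(cov, diff)))
    return (chi2 - k) / math.sqrt(2 * k)
```

Because the number of mocks (400) is much larger than the number of
bins, the covariance estimated this way is well conditioned, which is
the property the text relies on to avoid regularization.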

Given the constraints on $\langle F \rangle$, we can use the
simulations to infer the corresponding range in photo-ionization
rates $\Gamma(z)$, which, in addition to the inferred value of
$\langle F \rangle$, depend on the baryon density, $\Omega_b$, the
temperature-density relation, the fluctuation amplitude $\sigma_8$
and other cosmological parameters (Rauch et al. 1997).

Our inferred values for the photo-ionization rate, $\Gamma(z)$, are
compared in Fig. 9 to the results of Haardt & Madau (2001) and to
those of Faucher-Giguère et al. (2008, 2009), and are also listed in
Table 2. The red (Haardt & Madau 2001) and orange (Faucher-Giguère et
al. 2009) curves combine observationally inferred values for the
emissivities of sources of ionising photons with an assumed escape
fraction and a model for the mean free path based on observations to
estimate $\Gamma$. Note that Haardt & Madau (2011) recently derived a
lower value of $\Gamma \sim 0.9 \times 10^{-12}$ s$^{-1}$ for
$2 < z < 3$. In agreement with these models, we find little evidence
for evolution in $\Gamma$ over the redshift range $z = 2$-3. This is
also in agreement with the results of Bolton et al. (2005, their
Figure 7), although our error bars are again larger for $z = 2.5$ and
3. Our value for the amplitude is in good agreement with that from
Haardt & Madau, but is a factor of $\sim 2$ larger than that of
Faucher-Giguère et al. (2009). The latter value is not inferred from
simulations, but from a fit to the density distribution of the IGM by
Miralda-Escudé et al. (2000), itself guided by older simulations of
Miralda-Escudé et al. (1996). The significant differences in
cosmological parameters of those simulations might explain the
significant offset in the inferred amplitude. Indeed, Pawlik et al.
(2009) found that the Miralda-Escudé et al. fit did not describe
their own simulations well.



5 DISCUSSION AND CONCLUSIONS 

We have compared the mean transmission, $\bar{F}$, as well as the
transmission probability distribution function, TPDF, in the H I
Lyman-$\alpha$ forest as derived from several observational samples,
as well as from mock samples computed using the GIMIC suite of
hydrodynamical simulations. The mean transmission $\bar{F}$ in the
Lyman-$\alpha$ forest varies considerably from QSO to QSO, even at a
given redshift. We have shown that, both in data and in simulations,
this is due to the presence of strong lines, which, though relatively
rare, contribute significantly to the opacity. This implies that a
large redshift path is required to accurately determine the mean
transmission.

We have compared in detail the variance $\sigma$ on $\bar{F}$ between
published data, our own analysis of the observed UVES LP sample, and
mocks computed from the GIMIC hydrodynamical simulations. We have
shown, from observations only, that bootstrap errors depend
sensitively on chunk size, and only start to converge when relatively
large chunks, $\sim 25$ Å, are used. This is larger than typically
used, and as a consequence we claim that published errors may be
slightly underestimated, especially at larger redshift. We compared
the mean transmission computed from the GIMIC simulations to that
obtained from three observational samples. The GIMIC simulations are
zoomed simulations of different density regions picked from the
Millennium simulation, and as such they have a realistic amount of
'sample variance'. We exploited this feature of the simulations to
estimate the uncertainty in the determination of $\langle F \rangle$
for various observed samples. When we compute errors in the same way
as in published work, we find excellent agreement between published
and predicted values. We have also shown that converged bootstrap
errors are in good agreement with errors found from bootstrapping
mock samples. Thus, we find larger uncertainties than in previous
works. For a given value of $\langle F \rangle$, the variance on the
mean transmission is large enough to make all previously published
values consistent within the scatter.

Using mock spectra derived from GIMIC, we have investigated the
dependence of the variance of the mean transmitted flux on the
absorption path $\Delta X$; see Table 3. At $z = 2.5$, with a sample
twice as large as the LP sample, the $2\sigma$ variance is only
0.013, and it decreases to 0.009 with a sample four times as large,
which is half of the $2\sigma$ value for one LP sample, as expected.
We note, however, that the size of our simulations may not be
sufficient to evaluate the variance with such a large velocity path,
especially at $z = 2$.
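The 'as expected' above refers to the scatter falling as the inverse
square root of the path length. As a small arithmetic check, the
sketch below compares that scaling with the values listed in Table 3;
the variable and function names are illustrative.

```python
import math

# 2-sigma variance of the mean transmission for one LP sample (Table 3, top row).
TWO_SIGMA_LP = {2.0: 0.011, 2.5: 0.017, 3.0: 0.034}
# Values quoted in Table 3 for samples two and four times as large.
QUOTED = {
    2: {2.0: 0.0078, 2.5: 0.013, 3.0: 0.024},
    4: {2.0: 0.0054, 2.5: 0.0088, 3.0: 0.017},
}

def scaled(two_sigma, factor):
    """Expected scatter if it falls as 1/sqrt(path length)."""
    return two_sigma / math.sqrt(factor)

def max_relative_deviation():
    """Largest fractional difference between the scaling and the quoted values."""
    worst = 0.0
    for factor, row in QUOTED.items():
        for z, value in row.items():
            expected = scaled(TWO_SIGMA_LP[z], factor)
            worst = max(worst, abs(expected - value) / value)
    return worst

for factor, row in QUOTED.items():
    for z, value in row.items():
        expected = scaled(TWO_SIGMA_LP[z], factor)
        print(f"z = {z}, sample x{factor}: scaled {expected:.4f}, quoted {value}")
```

The quoted values follow the inverse square-root scaling to within
roughly ten per cent in every redshift bin.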

We have also investigated the probability distribution of the
transmission. The ensemble variance between mock samples is
systematically larger than the jack-knife errors used by previous
authors, by a factor of 1.5-2 in the redshift bin $z = 3$. More
importantly, the covariance matrix derived from a suite of mocks can
be inverted without regularization, contrary to standard estimates
with jack-knife methods. We used these larger errors to compare data
to simulations.

The temperature-density relation,
$T = T_0 (\rho/\langle\rho\rangle)^{\gamma-1}$, in the GIMIC
simulations is a result of adiabatic cooling and photo-heating due to
an imposed ionising background as computed by Haardt & Madau (2001),
tweaked to yield values for $T_0$ and $\gamma$ consistent with the
measured values of Schaye et al. (2000). In this model $\gamma > 1$
at all times, with a minimum value of $\gamma \sim 1.3$ around
redshift $z = 3$ caused by He II re-ionization (Theuns et al. 2002).
The GIMIC transmission PDF is in agreement with that measured from
high-resolution quasar spectra over the redshift range $z = 2$-3 in
the transmission range $0.1 < F < 0.7$. For $F < 0.1$ there may be
differences due to the neglect of self-shielding in the simulations,
whereas for $F > 0.7$ uncertainties in continuum fitting the data
complicate the comparison. This agreement is obtained using a
specific set of cosmological parameters. In particular, we assume
$\sigma_8 = 0.9$. The goal of this work is not to provide the best
fitting cosmological model, but to point out the large effect of
sample variance. Indeed, our model with
$(\sigma_8, \gamma) = (0.9, 1)$ is not ruled out by the current set
of data, while Viel et al. (2009) discard those values at more than
$2\sigma$ when considering the whole flux range. Thus, we argue that
previous suggestions for an inverted $T$-$\rho$ relation may have
resulted from an underestimate of the errors in the observations,
rather than a discrepancy between data and the standard model.

Table 3. Dependence of the variance of the mean transmission on
absorption distance $\Delta X$, for three redshifts. The top row
shows the variance ($2\sigma$) for the current LP sample (with given
absorption distance $\Delta X_{\rm LP}$). The second and third rows
are for samples two and four times as large. Errors correspond to the
variance among mock LP samples.

sample size             z = 2.0     z = 2.5     z = 3.0

$\Delta X_{\rm LP}$     10.5        5.8         2.9

$\Delta X \times 1$     0.011       0.017       0.034
$\Delta X \times 2$     0.0078      0.013       0.024
$\Delta X \times 4$     0.0054      0.0088      0.017



ACKNOWLEDGEMENTS 

We thank the anonymous referee for useful comments that improved the
quality of the paper. We would like to thank our collaborators for
allowing us to analyse the GIMIC simulations for this purpose. These
simulations were carried out using the HPCx facility at the Edinburgh
Parallel Computing Centre (EPCC) as part of the EC's DEISA 'Extreme
Computing Initiative', and with the Cosmology Machine at the
Institute for Computational Cosmology of Durham University. This work
was supported by an NWO VIDI grant and by the Marie Curie Initial
Training Network CosmoComp (PITN-GA-2009-238536).



REFERENCES 

Altay G., Theuns T., Schaye J., Crighton N. H. M., Dalla Vecchia C., 2011, ApJ, 737, L37
Aracil B., Petitjean P., Pichon C., Bergeron J., 2004, A&A, 419, 811
Becker G. D., Bolton J. S., Haehnelt M. G., Sargent W. L. W., 2011, MNRAS, 410, 1096
Becker G. D., Rauch M., Sargent W. L. W., 2007, ApJ, 662, 72
Bergeron J., Petitjean P., Aracil B. et al., 2004, The Messenger, 118, 40
Bi H. G., Boerner G., Chu Y., 1992, A&A, 266, 1
Bolton J. S., Haehnelt M. G., Viel M., Springel V., 2005, MNRAS, 357, 1178
Bolton J. S., Oh S. P., Furlanetto S. R., 2009, MNRAS, 395, 736
Bolton J. S., Viel M., Kim T.-S., Haehnelt M. G., Carswell R. F., 2008, MNRAS, 386, 1131
Boyarsky A., Ruchayskiy O., Iakubovskyi D., 2009, Journal of Cosmology and Astro-Particle Physics, 3, 5
Calura F., Tescari E., D'Odorico V., Viel M., Cristiani S., Kim T.-S., Bolton J. S., 2012, MNRAS, 422, 3019
Carswell R. F., Webb J. K., Baldwin J. A., Atwood B., 1987, ApJ, 319, 709
Cen R., Miralda-Escudé J., Ostriker J. P., Rauch M., 1994, ApJ, 437, L9
Chang P., Broderick A. E., Pfrommer C., 2011, ArXiv:astro-ph/1106.5504
Cowie L. L., Songaila A., Kim T.-S., Hu E. M., 1995, AJ, 109, 1522
Crain R. A., Theuns T., Dalla Vecchia C. et al., 2009, MNRAS, 399, 1773
Crighton N. H. M., Morris S. L., Bechtold J., Crain R. A., Jannuzi B. T., Shone A., Theuns T., 2010, MNRAS, 402, 1273
Dalla Vecchia C., Schaye J., 2008, MNRAS, 387, 1431
Desjacques V., Nusser A., Sheth R. K., 2007, MNRAS, 374, 206



Fan X., Carilli C. L., Keating B., 2006, ARA&A, 44, 415
Faucher-Giguère C.-A., Lidz A., Zaldarriaga M., Hernquist L., 2009, ApJ, 703, 1416
Faucher-Giguère C.-A., Prochaska J. X., Lidz A., Hernquist L., Zaldarriaga M., 2008, ApJ, 681, 831
Fukugita M., Hogan C. J., Peebles P. J. E., 1998, ApJ, 503, 518
Gratton S., Lewis A., Efstathiou G., 2008, Physical Review D, 77, 083507
Guimarães R., Petitjean P., Rollinde E., de Carvalho R. R., Djorgovski S. G., Srianand R., Aghaee A., Castro S., 2007, MNRAS, 377, 657
Gunn J. E., Peterson B. A., 1965, ApJ, 142, 1633
Haardt F., Madau P., 2001, in Neumann D. M., Tran J. T. V., eds, Clusters of Galaxies and the High Redshift Universe Observed in X-rays. Modelling the UV/X-ray cosmic background with CUBA
Haardt F., Madau P., 2011, ArXiv:1105.2039
Hernquist L., Katz N., Weinberg D. H., Miralda-Escudé J., 1996, ApJ, 457, L51
Hu E. M., Kim T.-S., Cowie L. L., Songaila A., Rauch M., 1995, AJ, 110, 1526
Hui L., Gnedin N. Y., 1997, MNRAS, 292, 27
Hui L., Haiman Z., 2003, ApJ, 596, 9
Kim T.-S., Bolton J. S., Viel M., Haehnelt M. G., Carswell R. F., 2007, MNRAS, 382, 1657
Kim Y.-R., Croft R. A. C., 2008, MNRAS, 387, 377
Kirkman D., Tytler D., Suzuki N., Melis C., Hollywood S., James K., So G., Lubin D., Jena T., Norman M. L., Paschos P., 2005, MNRAS, 360, 1373
Komatsu E., Dunkley J., Nolta M. R., Bennett C. L., Gold B., Hinshaw G., Jarosik N., Larson D., Limon M., Page L., Spergel D. N., Halpern M., Hill R. S., Kogut A., Meyer S. S., Tucker G. S., Weiland J. L., Wollack E., Wright E. L., 2009, ApJS, 180, 330
Lidz A., Faucher-Giguère C.-A., Dall'Aglio A., McQuinn M., Fechner C., Zaldarriaga M., Hernquist L., Dutta S., 2010, ApJ, 718, 199
Lidz A., Heitmann K., Hui L., Habib S., Rauch M., Sargent W. L. W., 2006, ApJ, 638, 27
Lynds R., 1971, ApJ, 164, L73
McDonald P., Miralda-Escudé J., 1999, ApJ, 518, 24
McDonald P., Miralda-Escudé J., Rauch M. et al., 2000, ApJ, 543, 1
McDonald P., Miralda-Escudé J., Rauch M., Sargent W. L. W., Barlow T. A., Cen R., 2001, ApJ, 562, 52
McDonald P., Seljak U., Burles S. et al., 2006, ApJS, 163, 80
McDonald P., Seljak U., Cen R., Bode P., Ostriker J. P., 2005, MNRAS, 360, 1471
McQuinn M., Hernquist L., Lidz A., Zaldarriaga M., 2011, MNRAS, 415, 977
McQuinn M., Lidz A., Zaldarriaga M., Hernquist L., Hopkins P. F., Dutta S., Faucher-Giguère C.-A., 2009, ApJ, 694, 842
Meiksin A., Bryan G., Machacek M., 2001, MNRAS, 327, 296
Miralda-Escudé J., Cen R., Ostriker J. P., Rauch M., 1996, ApJ, 471, 582
Miralda-Escudé J., Haehnelt M., Rees M. J., 2000, ApJ, 530, 1
Mortlock D. J., Warren S. J., Venemans B. P., Patel M., Hewett P. C., McMahon R. G., Simpson C., Theuns T., Gonzales-Solares E. A., Adamson A., Dye S., Hambly N. C., Hirst P., Irwin M. J., Kuiper E., Lawrence A., Röttgering H. J. A., 2011, Nature, 474, 616
Pawlik A. H., Schaye J., van Scherpenzeel E., 2009, MNRAS, 394, 1812






Petitjean P., Mueket J. P., Kates R. E., 1995, A&A, 295, L9
Petitjean P., Webb J. K., Rauch M., Carswell R. F., Lanzetta K., 1993, MNRAS, 262, 499
Puchwein E., Pfrommer C., Springel V., Broderick A. E., Chang P., 2011, ArXiv:astro-ph/1107.3837
Rauch M., 1998, ARA&A, 36, 267
Rauch M., Miralda-Escudé J., Sargent W. L. W., Barlow T. A., Weinberg D. H., Hernquist L., Katz N., Cen R., Ostriker J. P., 1997, ApJ, 489, 7
Ricotti M., Gnedin N. Y., Shull J. M., 2000, ApJ, 534, 41
Rollinde E., Petitjean P., Pichon C., Colombi S., Aracil B., D'Odorico V., Haehnelt M. G., 2003, MNRAS, 341, 1279
Rollinde E., Srianand R., Theuns T., Petitjean P., Chand H., 2005, MNRAS, 361, 1015
Schaye J., 2001, ApJ, 559, 507
Schaye J., Aguirre A., Kim T.-S., Theuns T., Rauch M., Sargent W. L. W., 2003, ApJ, 596, 768
Schaye J., Dalla Vecchia C., 2008, MNRAS, 383, 1210
Schaye J., Dalla Vecchia C., Booth C. M., Wiersma R. P. C., Theuns T., Haas M. R., Bertone S., Duffy A. R., McCarthy I. G., van de Voort F., 2010, MNRAS, 402, 1536
Schaye J., Theuns T., Rauch M., Efstathiou G., Sargent W. L. W., 2000, MNRAS, 318, 817
Springel V., 2005, MNRAS, 364, 1105
Springel V., White S. D. M., Jenkins A., Frenk C. S., Yoshida N., Gao L., Navarro J., Thacker R., Croton D., Helly J., Peacock J. A., Cole S., Thomas P., Couchman H., Evrard A., Colberg J., Pearce F., 2005, Nature, 435, 629
Theuns T., Leonard A., Efstathiou G., Pearce F. R., Thomas P. A., 1998, MNRAS, 301, 478
Theuns T., Schaye J., Zaroubi S., Kim T.-S., Tzanavaris P., Carswell B., 2002, ApJ, 567, L103
Theuns T., Viel M., Kay S., Schaye J., Carswell R. F., Tzanavaris P., 2002, ApJ, 578, L5
Theuns T., Zaroubi S., Kim T.-S., Tzanavaris P., Carswell R. F., 2002, MNRAS, 332, 367
Tytler D., Paschos P., Kirkman D., Norman M. L., Jena T., 2009, MNRAS, 393, 723
Viel M., Bolton J. S., Haehnelt M. G., 2009, MNRAS, 399, L39
Viel M., Haehnelt M. G., 2006, MNRAS, 365, 231
Viel M., Haehnelt M. G., Carswell R. F., Kim T.-S., 2004, MNRAS, 349, L33
Wiersma R. P. C., Schaye J., Smith B. D., 2009, MNRAS, 393, 99
Wiersma R. P. C., Schaye J., Theuns T., Dalla Vecchia C., Tornatore L., 2009, MNRAS, 399, 574
Zhang Y., Anninos P., Norman M. L., 1995, ApJ, 453, L57






